1 Overview
FungiExpresZ is a browser-based user interface (developed in R Shiny) for analysing and visualizing gene expression data. It allows users to visualize their own gene expression data as well as more than 13,000 pre-processed fungal gene expression samples from SRA. Users can also merge their data with SRA data for combined analysis and visualization. By uploading a gene expression matrix (a .txt file in which rows are genes and columns are samples), users can generate 12 different exploratory visualizations and 6 different GO visualizations. Optionally, users can upload multiple gene groups and sample groups to compare between them. Users can select a set of genes directly from a scatter plot, line plot or heatmap and pass them on for GO analysis and GO visualization. GO analysis and GO visualizations are implemented through the popular R package clusterProfiler (Yu et al. 2012) and are available for more than 100 different fungal species.
2 Key features
2.1 More than 13,000 NCBI-SRA data from 8 different fungal species
FungiExpresZ provides normalized gene expression values (FPKM) for more than 13,000 SRA samples. Users can select one or more SRA samples for visualization. Data can be searched by species, genotype, strain or free text, which is matched against several SRA columns.
| species | #sra_samples |
|---|---|
| Aspergillus nidulans FGSC A4 | 151 |
| Candida albicans SC5314 | 639 |
| Saccharomyces cerevisiae | 11872 |
| Aspergillus fumigatus Af293 | 242 |
| Aspergillus niger CBS 513.88 | 253 |
| Candida glabrata CBS 138 | 126 |
| Talaromyces marneffei ATCC 18224 | 26 |
| Candida auris B8441 | 46 |
NOTE: We are continuously processing fungal SRA data. This table will be updated as we add new data.
2.2 Visualize gene expression data with or without integration of SRA data
Users can analyze and visualize their own data by uploading a .txt/.csv file (columns are samples and rows are genes). Optionally, user data can be integrated with selected SRA data for combined analysis and visualization.
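To make the expected file layout concrete, here is a minimal sketch of how such a matrix could be parsed and checked before upload. FungiExpresZ itself is an R Shiny app; this Python snippet is not part of the tool, and the function name, the tab delimiter and the gene ids (`geneA`, `geneB`) are assumptions made for illustration only.

```python
import csv
import io


def read_expression_matrix(text, delimiter="\t"):
    """Parse a delimited matrix: first column holds gene ids, the rest are samples."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    if len(set(samples)) != len(samples):
        raise ValueError("sample (column) names must be unique")
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if gene_id in matrix:
            raise ValueError(f"duplicate gene id: {gene_id}")
        if len(values) != len(samples):
            raise ValueError(f"wrong number of values for {gene_id}")
        matrix[gene_id] = [float(v) for v in values]  # fails loudly on non-numeric cells
    return samples, matrix


# Hypothetical two-gene, two-sample example
example = (
    "gene_id\tControl_Rep.A\tControl_Rep.B\n"
    "geneA\t10.5\t11.0\n"
    "geneB\t0.0\t2.5\n"
)
samples, matrix = read_expression_matrix(example)
```

Passing `delimiter=","` would handle a .csv file in the same way.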
2.3 Visualize multiple gene groups and sample groups in a single plot
Optionally, users can upload sample groups (e.g. replicates, control vs treatment, wild type vs deletion etc.) and multiple gene groups to compare between them. Group information, once uploaded, can be used across several plots through the fill and facet plot attributes to build more complex visualizations.
2.4 Twelve different data exploratory visualizations
FungiExpresZ provides a user-friendly, browser-based interface for generating 12 different publication-ready visualizations based on ggplot2. Users can adjust several common plot attributes, such as plot title, axis title, font size, plot theme, legend size and legend position, as well as a few plot-specific attributes. Currently available plots are …
- Scatter Plot
- Multi-Scatter Plot
- Corr Heat Box
- Density Plot
- Histogram
- Joy Plot
- Box Plot
- Violin Plot
- Bar Plot
- PCA Plot
- Line Plot
- Heatmap
2.5 Supports Gene Ontology (GO) enrichment and visualizations for more than 100 different fungal species
FungiExpresZ allows users to define gene set(s) directly from a plot (scatter plot, line plot or heatmap) to perform gene ontology enrichment and visualization. Available GO visualizations are …
- Emap plot
- Cnet plot
- Dot plot
- Bar plot
- Heat plot
- Upset plot
3 Installation
FungiExpresZ can also be installed locally as an R package or as a Docker image. Please follow the instructions given on GitHub to install it on a local computer.
4 Example data
The plots in this document were generated from cartoon (mock) gene expression data.
4.1 Expression matrix
The expression matrix can be downloaded from the file given here. It contains 4 conditions (one control and three treatments), each with 3 replicates. Column names and their descriptions are given in the table below.
| Column names | Description |
|---|---|
| gene_id | Gene id, unique to each row |
| Control_Rep.A | Normalised FPKM values |
| Control_Rep.B | Normalised FPKM values |
| Control_Rep.C | Normalised FPKM values |
| Treat1_Rep.A | Normalised FPKM values |
| Treat1_Rep.B | Normalised FPKM values |
| Treat1_Rep.C | Normalised FPKM values |
| Treat2_Rep.A | Normalised FPKM values |
| Treat2_Rep.B | Normalised FPKM values |
| Treat2_Rep.C | Normalised FPKM values |
| Treat3_Rep.A | Normalised FPKM values |
| Treat3_Rep.B | Normalised FPKM values |
| Treat3_Rep.C | Normalised FPKM values |
| Control_Mean | Mean FPKM of control replicates A,B,C |
| Treat1_Mean | Mean FPKM of Treatment1 replicates A,B,C |
| Treat2_Mean | Mean FPKM of Treatment2 replicates A,B,C |
| Treat3_Mean | Mean FPKM of Treatment3 replicates A,B,C |
| fc_treat1 | log2 fold change: log2(Treat1_Mean/Control_Mean) |
| fc_treat2 | log2 fold change: log2(Treat2_Mean/Control_Mean) |
| fc_treat3 | log2 fold change: log2(Treat3_Mean/Control_Mean) |
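The derived columns in the table can be reproduced from the replicate columns. The sketch below is one way to do it in plain Python, assuming each fold change is taken against the single control mean. The `pseudocount` parameter is an assumption added here to avoid dividing by zero when a mean FPKM is 0; the document does not say whether the example data used one.

```python
import math
from statistics import mean


def summarize(row, conditions=("Control", "Treat1", "Treat2", "Treat3"),
              reps=("A", "B", "C"), pseudocount=0.0):
    """Add <condition>_Mean and fc_<treatment> columns to one gene's row.

    The first entry of `conditions` is treated as the control.
    """
    out = dict(row)
    for cond in conditions:
        out[f"{cond}_Mean"] = mean(row[f"{cond}_Rep.{r}"] for r in reps)
    control_mean = out[f"{conditions[0]}_Mean"]
    for cond in conditions[1:]:
        ratio = (out[f"{cond}_Mean"] + pseudocount) / (control_mean + pseudocount)
        out[f"fc_{cond.lower()}"] = math.log2(ratio)
    return out


# Hypothetical gene with constant replicate values, chosen so the result is easy to check
row = {"gene_id": "geneA"}
for cond, value in [("Control", 10.0), ("Treat1", 40.0), ("Treat2", 5.0), ("Treat3", 10.0)]:
    for rep in "ABC":
        row[f"{cond}_Rep.{rep}"] = value

summary = summarize(row)
```

With the default `pseudocount=0.0`, a gene whose control mean is 0 raises `ZeroDivisionError`, so a small positive pseudocount is worth considering for real data.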
4.2 Sample groups
Sample group file contains two columns.
| Columns | Description |
|---|---|
| group_name | User-given name for each sample (column) group. Values in this column can be repeated. |
| group_members | Values in this column must be column names used as sample identifiers in the expression matrix file. Each value must be unique in this column. |
Here, samples are grouped by replicates. The file can be downloaded from here.
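A sample group file of this shape can be derived from the column names of the example matrix. The sketch below assumes the `<condition>_Rep.<replicate>` naming used above and writes a tab-delimited file. The helper names and the choice of delimiter are illustrative assumptions, not something FungiExpresZ prescribes.

```python
import csv
import io


def group_samples_by_replicate(sample_names, separator="_Rep."):
    """Return (group_name, group_member) pairs, e.g. ("Control", "Control_Rep.A")."""
    rows = []
    for name in sample_names:
        group, sep, replicate = name.partition(separator)
        if not sep or not replicate:
            raise ValueError(f"cannot infer group from {name!r}")
        rows.append((group, name))
    return rows


def to_group_file(rows, known_columns):
    """Check the two rules from the table above, then render the file as text."""
    members = [member for _, member in rows]
    if len(set(members)) != len(members):
        raise ValueError("each group member must appear only once")
    unknown = sorted(set(members) - set(known_columns))
    if unknown:
        raise ValueError(f"not in expression matrix: {unknown}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["group_name", "group_members"])
    writer.writerows(rows)
    return buffer.getvalue()


columns = [f"{cond}_Rep.{rep}"
           for cond in ("Control", "Treat1", "Treat2", "Treat3")
           for rep in "ABC"]
rows = group_samples_by_replicate(columns)
group_file_text = to_group_file(rows, columns)
```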
4.3 Gene groups
Gene group file contains two columns.
| Columns | Description |
|---|---|
| group_name | User-given name for each gene (row) group. Values in this column can be repeated. |
| group_members | Values in this column must be from the first column given as row identity in the expression matrix file. Each value must be unique in this column. |
Here, genes are grouped in two ways:
- Fold change comparison: Treatment (1, 2 or 3)/Control
    - Three groups in each comparison
        - UP
        - DOWN
        - NC
- Fold change status in two different comparisons
    - Nine groups in each category
        - UP_UP
        - UP_DOWN
        - UP_NC
        - DOWN_DOWN
        - DOWN_UP
        - DOWN_NC
        - NC_NC
        - NC_UP
        - NC_DOWN
Gene group files can be downloaded from the links given below.
| Gene group files | Description |
|---|---|
| gene group file 1 | gene groups by fold change Treat1/control |
| gene group file 2 | gene groups by fold change Treat2/control |
| gene group file 3 | gene groups by fold change Treat3/control |
| gene group file 4 | gene groups by fold status Treat1/control vs Treat2/control |
| gene group file 5 | gene groups by fold status Treat2/control vs Treat3/control |
| gene group file 6 | gene groups by fold status Treat1/control vs Treat3/control |
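The UP/DOWN/NC labels and their pairwise combinations can be derived from the log2 fold change columns. The sketch below shows one plausible rule. The cutoff of 1 (a two-fold change) is an assumption, since the document does not state which threshold was used for the example files, and the gene ids are hypothetical.

```python
def fold_change_status(log2fc, threshold=1.0):
    """Label one log2 fold change as UP, DOWN or NC (no change)."""
    if log2fc >= threshold:
        return "UP"
    if log2fc <= -threshold:
        return "DOWN"
    return "NC"


def combined_status(log2fc_a, log2fc_b, threshold=1.0):
    """Label a gene by its status in two comparisons, e.g. 'UP_NC'."""
    return f"{fold_change_status(log2fc_a, threshold)}_{fold_change_status(log2fc_b, threshold)}"


# Hypothetical fold changes: gene id -> (fc_treat1, fc_treat2)
fold_changes = {
    "geneA": (2.3, 0.1),
    "geneB": (-1.5, -2.0),
    "geneC": (0.2, 1.0),
}

# Rows in the two-column gene group layout: (group_name, group_members)
single_groups = [(fold_change_status(fc1), gene) for gene, (fc1, _) in fold_changes.items()]
paired_groups = [(combined_status(fc1, fc2), gene) for gene, (fc1, fc2) in fold_changes.items()]
```

Because every gene receives exactly one label per file, the uniqueness rule for `group_members` holds by construction.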
5 Exploratory example plots
The plots below can be generated by uploading the data files described above to FungiExpresZ.
5.1 Scatter plot
A scatter plot can be used to display pairwise correlation between 2 samples. Users can color the dots either by density (default) or by gene groups.
Scatter plot: dots colored by density (left) and by gene groups (right)
5.2 Multi-Scatter plot
A multi-scatter plot can be used to display pairwise correlation between more than 2 samples (recommended for showing correlation between replicate samples). The lower half of the plot shows scatter plots, while the upper half shows correlation values. The diagonal displays the distribution of each sample as a density plot. As the number of samples increases, the number of panels in a single graphical device grows quadratically (n samples give n × n panels), which makes the image crowded and harder to interpret. Therefore, a multi-scatter plot is restricted to a maximum of 5 samples. For more than 5 samples, the correlation heat-box (CorrHeatBox) is an alternative that shows correlation as a heatmap.
Multi scatter plot: pairwise correlation between replicate pairs
5.3 CorrHeatBox
CorrHeatBox is useful for displaying pairwise correlation as a heatmap.
Correlation heatbox: represented as square (left) and circle (right)
Correlation heatbox: represented as upper half (left) and lower half (right)
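The values behind both the multi-scatter plot and the CorrHeatBox form a pairwise correlation matrix. As a rough sketch of what such a matrix contains, the Python below computes Pearson correlations between sample columns. The choice of Pearson is an assumption (the document does not name the correlation method), and the sample values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return {sample: {sample: r}} for every ordered pair of samples."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical expression values for four genes in three samples
columns = {
    "Control_Rep.A": [1.0, 2.0, 3.0, 4.0],
    "Control_Rep.B": [2.0, 4.0, 6.0, 8.0],
    "Treat1_Rep.A": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(columns)
panel_count = len(columns) ** 2  # an n-sample multi-scatter plot has n * n panels
```

The matrix is symmetric, which is why showing only the upper or lower half loses no information.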
5.4 Density plot
Density plot can be used to display distribution of individual sample, sample groups or gene groups.
Density plot: distribution of single sample single gene group (left) and multiple samples single gene group (right)
Density plot: distribution of single sample multiple gene groups (left) and multiple samples multiple gene groups (right).
5.5 Histogram
Histogram can be used to display frequency count of individual sample, sample groups or gene groups.
Histogram: frequency of single sample single gene group (left) and multiple samples single gene group (right)
Histogram: frequency of single sample multiple gene groups (left) and multiple samples multiple gene groups (right)
5.6 Joy plot
A joy plot can be used to display the distribution of an individual sample, sample groups or gene groups. By separating multiple variables along the Y axis, it overcomes the overlap that limits an ordinary density plot.
Joy plot: multiple samples single gene group color by probability (left) and color by quantile (right)
Joy plot: multiple sample groups single gene group (left) and multiple samples multiple gene groups (right)
5.7 Box plot
Boxplot can be used to display distribution of each observation and quantiles from individual sample, sample groups or gene groups.
Box plot: multiple samples colored by samples (left) and colored by sample groups (right)
Box plot: multiple samples multiple sample groups (left) and multiple samples multiple gene groups (right)
5.8 Violin plot
Similar to a box plot, a violin plot can be used to display the distribution of each observation and quantiles from an individual sample, sample groups or gene groups.
Violin plot: multiple samples colored by samples (left) and colored by sample groups (right)
Violin plot: multiple samples multiple sample groups (left) and multiple samples multiple gene groups (right)
5.9 Bar plot
Bar plot can be used to display expression of individual genes across multiple samples, sample groups and gene groups.
Bar plot: expression of individual genes across samples, colored by genes (left) and colored by genes and faceted by sample groups (right)
5.10 PCA plot
A PCA plot can be used to display similarities and differences between samples and sample groups using principal components.
PCA plot : color by sample groups (left) and color by k-means (right)
5.11 Line plot
A line plot can be used to display the trend of genes across multiple samples. Users can group observations either by k-means or by pre-defined gene groups.
Line plot : k-means clusters individual gene (left) and cluster mean (right)
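For readers unfamiliar with k-means, the sketch below shows the basic idea of clustering gene expression profiles and computing the cluster means that the right-hand panel draws. It is a deliberately simplified, dependency-free version and not the implementation used by FungiExpresZ: it seeds the centroids with the first k profiles (real implementations usually use random starts), runs a fixed number of iterations, and uses made-up profiles.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Cluster profiles into k groups; returns (assignment, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplification: seed with the first k profiles
    assignment = []
    for _ in range(iterations):
        # Assign every profile to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean profile of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical profiles across three samples: two rising genes, two falling genes
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [10.0, 9.0, 8.0],
    "geneC": [1.1, 2.1, 2.9],
    "geneD": [9.5, 9.0, 8.2],
}
assignment, centroids = kmeans(list(profiles.values()), k=2)
clusters = dict(zip(profiles, assignment))
```

Each entry of `centroids` is the mean profile of one cluster, which corresponds to the "cluster mean" view of the line plot.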
5.12 Heatmap
A heatmap can be used to display the trend of genes across multiple samples. Users can group genes and samples by k-means, by pre-defined gene groups or by sample groups.
Heatmap: row clusters by k-means (left) and row clusters by gene groups (right)
Heatmap: along with column box plot on top (left) and parallel row standard deviation heatmap (right)
Heatmap: row clusters sorted by standard deviation (left) and columns clustered by sample groups (right)
6 GO example plots
Users can select genes or gene clusters from a scatter plot, line plot or heatmap and pass them to GO enrichment followed by GO visualization. If the data were uploaded by the user, the gene IDs (first column of the file) must match the gene IDs of the selected species.
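GO enrichment of a selected gene set is commonly framed as an over-representation test based on the hypergeometric distribution. The sketch below illustrates that calculation for a single GO term in plain Python. It is not clusterProfiler code, the function and parameter names are chosen here for illustration, and multiple-testing correction across terms is left out.

```python
from math import comb


def enrichment_p_value(hits, selected, annotated, background):
    """P(X >= hits) for X ~ Hypergeometric(background, annotated, selected).

    hits       -- selected genes annotated with the GO term
    selected   -- size of the selected gene set
    annotated  -- background genes annotated with the GO term
    background -- total number of background genes
    """
    total = comb(background, selected)
    upper = min(selected, annotated)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 4 carry the term, 3 genes selected, 2 of them carry it
p_value = enrichment_p_value(hits=2, selected=3, annotated=4, background=10)
```

Genes whose ids do not match the selected species drop out of both the selected set and the annotation lookup, which is why the id match described above matters for the result.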
6.1 GO dotplot
GO dotplot
6.2 GO barplot
GO barplot
6.3 GO heatplot
GO heatplot
6.4 GO emapplot
GO emapplot
6.5 GO cnetplot
GO cnetplot
6.6 GO upsetplot
GO upsetplot
References
Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology. https://doi.org/10.1089/omi.2011.0118.